IEICE global.ieice.org Site

Keyword Search Result

[Keyword] neural network(855hit)

241-260hit(855hit)

Automatic Speech Recognition System with Output-Gate Projected Gated Recurrent Unit
Gaofeng CHENG Pengyuan ZHANG Ji XU

PAPER-Speech and Hearing

Publicized:
2018/11/19
Vol:
E102-D No:2
Page(s):
355-363
The long short-term memory recurrent neural network (LSTM) has achieved tremendous success for automatic speech recognition (ASR). However, the complicated gating mechanism of LSTM introduces a massive computational cost and limits the application of LSTM in some scenarios. In this paper, we describe our work on accelerating the decoding speed and improving the decoding accuracy. First, we propose an architecture, which is called Projected Gated Recurrent Unit (PGRU), for ASR tasks, and show that the PGRU can consistently outperform the standard GRU. Second, to improve the PGRU generalization, particularly on large-scale ASR tasks, we propose the Output-gate PGRU (OPGRU). In addition, the time delay neural network (TDNN) and normalization methods are found beneficial for OPGRU. In this paper, we apply the OPGRU for both the acoustic model and recurrent neural network language model (RNN-LM). Finally, we evaluate the PGRU on the total Eval2000 / RT03 test sets, and the proposed OPGRU single ASR system achieves 0.9% / 0.9% absolute (8.2% / 8.6% relative) reduction in word error rate (WER) compared to our previous best LSTM single ASR system. Furthermore, the OPGRU ASR system achieves significant speed-up on both acoustic model and language model rescoring.
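The abstract above reports both absolute and relative WER reductions. As a minimal sketch of how the two figures relate, the snippet below computes both from a pair of WER values. The baseline and new WERs used here are hypothetical numbers chosen for illustration; the abstract does not state them.

```python
def wer_reductions(baseline_wer, new_wer):
    """Return (absolute, relative) WER reduction; inputs are in percent."""
    absolute = baseline_wer - new_wer      # percentage points
    relative = absolute / baseline_wer     # fraction of the baseline WER
    return absolute, relative

# Hypothetical WERs for illustration only (not taken from the abstract).
absolute, relative = wer_reductions(11.0, 10.1)
print(f"absolute: {absolute:.1f} points, relative: {100 * relative:.1f}%")
```

The absolute reduction is a difference in percentage points, while the relative reduction divides that difference by the baseline WER, which is why the same improvement looks much larger in relative terms.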
A Low Cost Solution of Hand Gesture Recognition Using a Three-Dimensional Radar Array
Shengchang LAN Zonglong HE Weichu CHEN Kai YAO

PAPER-Sensing

Publicized:
2018/08/21
Vol:
E102-B No:2
Page(s):
233-240
In order to provide an alternative solution for human-machine interfaces, this paper proposed to recognize 10 human hand gestures regularly used in consumer electronics control scenarios based on a three-dimensional radar array. This radar array was composed of three low-cost 24 GHz K-band Doppler CW (Continuous Wave) miniature I/Q (In-phase and Quadrature) transceiver sensors mounted perpendicularly to each other. Temporal and spectral analysis was performed to extract magnitude and phase features from six channels of I/Q signals. Two classifiers were proposed to implement the recognition. First, a decision tree classifier performed fast, responsive recognition using supervised thresholds. To improve the recognition robustness, this paper further studied recognition using a two-layer CNN (Convolutional Neural Network) classifier with the frequency spectra as the inputs. Finally, the paper presented the experiments and analysed the performance of the radar array. Results showed that the proposed system could reach a recognition accuracy higher than 92%.
Image Watermarking Technique Using Embedder and Extractor Neural Networks
Ippei HAMAMOTO Masaki KAWAMURA

PAPER

Publicized:
2018/10/19
Vol:
E102-D No:1
Page(s):
19-30
An autoencoder has the potential ability to compress and decompress information. In this work, we consider the process of generating a stego-image from an original image and watermarks as compression, and the process of recovering the original image and watermarks from the stego-image as decompression. We propose embedder and extractor neural networks based on the autoencoder. The embedder network learns mapping from the DCT coefficients of the original image and a watermark to those of the stego-image. The extractor network learns mapping from the DCT coefficients of the stego-image to the watermark. Once the proposed neural network has been trained, the network can embed and extract the watermark into unlearned test images. We investigated the relation between the number of neurons and network performance by computer simulations and found that the trained neural network could provide high-quality stego-images and watermarks with few errors. We also evaluated the robustness against JPEG compression and found that, when suitable parameters were used, the watermarks were extracted with an average BER lower than 0.01 and image quality over 35 dB when the quality factor Q was over 50. We also investigated how to represent the watermarks in the stego-image by our neural network. There are two possibilities: distributed representation and sparse representation. From the results of investigation into the output of the stego layer (3rd layer), we found that the distributed representation emerged at an early learning step and then sparse representation came out at a later step.
Event De-Noising Convolutional Neural Network for Detecting Malicious URL Sequences from Proxy Logs
Toshiki SHIBAHARA Kohei YAMANISHI Yuta TAKATA Daiki CHIBA Taiga HOKAGUCHI Mitsuaki AKIYAMA Takeshi YAGI Yuichi OHSITA Masayuki MURATA

PAPER-Cryptography and Information Security

Vol:
E101-A No:12
Page(s):
2149-2161
The number of infected hosts on enterprise networks has been increased by drive-by download attacks. In these attacks, users of compromised popular websites are redirected toward websites that exploit vulnerabilities of a browser and its plugins. To prevent damage, detection of infected hosts on the basis of proxy logs rather than blacklist-based filtering has started to be researched. This is because blacklists have become difficult to create due to the short lifetime of malicious domains and concealment of exploit code. To detect accesses to malicious websites from proxy logs, we propose a system for detecting malicious URL sequences on the basis of three key ideas: focusing on sequences of URLs that include artifacts of malicious redirections, designing new features related to software other than browsers, and generating new training data with data augmentation. To find an effective approach for classifying URL sequences, we compared three approaches: an individual-based approach, a convolutional neural network (CNN), and our new event de-noising CNN (EDCNN). Our EDCNN reduces the negative effects of benign URLs redirected from compromised websites included in malicious URL sequences. Evaluation results show that only our EDCNN with proposed features and data augmentation achieved a practical classification performance: a true positive rate of 99.1%, and a false positive rate of 3.4%.
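The abstract above reports a true positive rate and a false positive rate. The following is a minimal sketch of how these two rates are computed from binary labels and predictions. The toy data is made up for illustration and has no connection to the paper's evaluation set.

```python
def rates(labels, predictions):
    """Return (true positive rate, false positive rate).

    labels and predictions are sequences of 0/1, where 1 means "malicious".
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

# Toy data, invented for illustration only.
labels      = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tpr, fpr = rates(labels, predictions)
print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}")
```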
Syntax-Based Context Representation for Statistical Machine Translation
Kehai CHEN Tiejun ZHAO Muyun YANG

PAPER-Natural Language Processing

Publicized:
2018/08/24
Vol:
E101-D No:12
Page(s):
3226-3237
Learning semantic representation for translation context is beneficial to statistical machine translation (SMT). Previous efforts have focused on implicitly encoding syntactic and semantic knowledge in translation context by neural networks, which are weak in capturing explicit structural syntax information. In this paper, we propose a new neural network with a tree-based convolutional architecture to explicitly learn structural syntax information in translation context, thus improving translation prediction. Specifically, we first convert parallel sentences with source parse trees into syntax-based linear sequences based on a minimum syntax subtree algorithm, and then define a tree-based convolutional network over the linear sequences to learn syntax-based context representation and translation prediction jointly. To verify the effectiveness, the proposed model is integrated into phrase-based SMT. Experiments on large-scale Chinese-to-English and German-to-English translation tasks show that the proposed approach can achieve a substantial and significant improvement over several baseline systems.
Hidden Singer: Distinguishing Imitation Singers Based on Training with Only the Original Song
Hosung PARK Seungsoo NAM Eun Man CHOI Daeseon CHOI

PAPER-Artificial Intelligence, Data Mining

Publicized:
2018/08/24
Vol:
E101-D No:12
Page(s):
3092-3101
Hidden Singer is a television program in Korea. In the show, the original singer and four imitating singers sing a song in hiding behind a screen. The audience and TV viewers attempt to guess who the original singer is by listening to the singing voices. Usually, there are few correct answers from the audience, because the imitators are well trained and highly skilled. We propose a computerized system for distinguishing the original singer from the imitating singers. During the training phase, the system learns only the original singer's song because it is the one the audience has heard before. During the testing phase, the songs of five candidates are provided to the system and the system then determines the original singer. The system uses a 1-class authentication method, in which only a subject model is made. The subject model is used for measuring similarities between the candidate songs. In this problem, unlike other existing studies that require artist identification, we cannot utilize multi-class classifiers and supervised learning because songs of the imitators and the labels are not provided during the training phase. Therefore, we evaluate the performances of several 1-class learning algorithms to choose which one is more efficient in distinguishing an original singer from among highly skilled imitators. The experiment results show that the proposed system using the autoencoder performs better (63.33%) than other 1-class learning algorithms: Gaussian mixture model (GMM) (50%) and one class support vector machines (OCSVM) (26.67%). We also conduct a human contest to compare the performance of the proposed system with human perception. The accuracy of the proposed system is found to be better (63.33%) than the average accuracy of human perception (33.48%).
Empirical Evaluation and Optimization of Hardware-Trojan Classification for Gate-Level Netlists Based on Multi-Layer Neural Networks
Kento HASEGAWA Masao YANAGISAWA Nozomu TOGAWA

LETTER

Vol:
E101-A No:12
Page(s):
2320-2326
Recently, it has been reported that malicious third-party IC vendors often insert hardware Trojans into their products. Especially in IC design step, malicious third-party vendors can easily insert hardware Trojans in their products and thus we have to detect them efficiently. In this paper, we propose a machine-learning-based hardware-Trojan detection method for gate-level netlists using multi-layer neural networks. First, we extract 11 Trojan-net feature values for each net in a netlist. After that, we classify the nets in an unknown netlist into a set of Trojan nets and that of normal nets using multi-layer neural networks. By experimentally optimizing the structure of multi-layer neural networks, we can obtain an average of 84.8% true positive rate and an average of 70.1% true negative rate while we can obtain 100% true positive rate in some of the benchmarks, which outperforms the existing methods in most of the cases.
A Two-Stage Crack Detection Method for Concrete Bridges Using Convolutional Neural Networks
Yundong LI Weigang ZHAO Xueyan ZHANG Qichen ZHOU

LETTER-Artificial Intelligence, Data Mining

Publicized:
2018/09/05
Vol:
E101-D No:12
Page(s):
3249-3252
Crack detection is a vital task for maintaining a bridge's health and safety condition. Traditional computer-vision-based methods easily suffer from noise and clutter in real bridge inspections. To address this limitation, we propose a two-stage crack detection approach based on Convolutional Neural Networks (CNN) in this letter. A predictor with a small receptive field is exploited in the first detection stage, while another predictor with a large receptive field is used to refine the detection results in the second stage. Benefiting from data fusion of the confidence maps produced by both predictors, our method can accurately predict the probability that each pixel belongs to a cracked area. Experimental results show that the proposed method is superior to an up-to-date method on real concrete surface images.
A Spectrum Sensing Algorithm for OFDM Signal Based on Deep Learning and Covariance Matrix Graph
Mengbo ZHANG Lunwen WANG Yanqing FENG Haibo YIN

PAPER-Wireless Communication Technologies

Publicized:
2018/05/30
Vol:
E101-B No:12
Page(s):
2435-2444
Spectrum sensing is the first task performed by cognitive radio (CR) networks. In this paper we propose a spectrum sensing algorithm for orthogonal frequency division multiplex (OFDM) signal based on deep learning and covariance matrix graph. The advantage of deep learning in image processing is applied to the spectrum sensing of OFDM signals. We start by building the spectrum sensing model of OFDM signal, and then analyze structural characteristics of covariance matrix (CM). Once CM has been normalized and transformed into a gray level representation, the gray scale map of covariance matrix (GSM-CM) is established. Then, the convolutional neural network (CNN) is designed based on the LeNet-5 network, which is used to learn the training data to obtain more abstract features hierarchically. Finally, the test data is input into the trained spectrum sensing network model, based on which spectrum sensing of OFDM signals is completed. Simulation results show that this method can complete the spectrum sensing task by taking advantage of the GSM-CM model, which has better spectrum sensing performance for OFDM signals under low SNR than existing methods.
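The abstract above describes normalizing a covariance matrix and transforming it into a gray-level map. One possible reading of that step is sketched below in plain Python. The function names, the zero-mean assumption, the use of entry magnitudes, and the 256 gray levels are all assumptions made for illustration; the abstract does not specify them.

```python
def covariance_matrix(x):
    """Sample covariance of the rows of x (zero-mean samples assumed).

    x is a list of equal-length lists of complex samples, one row per
    observation channel.
    """
    m, n = len(x), len(x[0])
    return [[sum(x[i][k] * x[j][k].conjugate() for k in range(n)) / n
             for j in range(m)]
            for i in range(m)]

def to_gray_map(cm, levels=256):
    """Map entry magnitudes to integer gray levels in [0, levels - 1]."""
    mags = [[abs(v) for v in row] for row in cm]
    peak = max(max(row) for row in mags)
    if peak == 0:
        return [[0 for _ in row] for row in mags]
    return [[round(v / peak * (levels - 1)) for v in row] for row in mags]

# Toy samples: rows 0 and 1 are identical, row 2 is orthogonal to both.
x = [
    [1 + 0j, -1 + 0j, 1 + 0j, -1 + 0j],
    [1 + 0j, -1 + 0j, 1 + 0j, -1 + 0j],
    [1 + 0j, 1 + 0j, -1 + 0j, -1 + 0j],
]
gray = to_gray_map(covariance_matrix(x))
for row in gray:
    print(row)
```

Correlated channels show up as bright off-diagonal pixels, which is the kind of structure an image classifier such as a CNN could then pick up.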
Deep Convolutional Neural Networks for Manga Show-Through Cancellation
Taku NAKAHARA Kazunori URUMA Tomohiro TAKAHASHI Toshihiro FURUKAWA

LETTER-Image Processing and Video Processing

Publicized:
2018/08/02
Vol:
E101-D No:11
Page(s):
2844-2848
Recently, the demand for the digitization of manga has increased. In the case of an old manga whose original pictures have been lost, we have to digitize it from the printed comics. However, scanning the comics causes the show-through phenomenon, since the pages are printed as double-sided images. This letter proposes a manga show-through cancellation method based on a deep convolutional neural network (CNN). Numerical results show the effectiveness of the proposed method.
High-Performance Super-Resolution via Patch-Based Deep Neural Network for Real-Time Implementation
Reo AOKI Kousuke IMAMURA Akihiro HIRANO Yoshio MATSUDA

PAPER-Image Processing and Video Processing

Publicized:
2018/08/20
Vol:
E101-D No:11
Page(s):
2808-2817
The super-resolution convolutional neural network (SRCNN) is widely known as a state-of-the-art method for achieving single-image super-resolution. However, performance problems such as jaggy and ringing artifacts exist in SRCNN. Moreover, in order to realize a real-time upconverting system for high-resolution video streams such as 4K/8K 60 fps, problems such as processing delay and implementation cost remain. In the present paper, we propose high-performance super-resolution via a patch-based deep neural network (SR-PDNN) rather than a convolutional neural network (CNN). Despite being a very simple end-to-end learning system, the SR-PDNN achieves higher performance than the conventional CNN-based approach. In addition, this system is suitable for ultra-low-delay video processing by hardware implementation using an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
Air-Writing Recognition Based on Fusion Network for Learning Spatial and Temporal Features
Buntueng YANA Takao ONOYE

PAPER-Neural Networks and Bioengineering

Vol:
E101-A No:11
Page(s):
1737-1744
A fusion framework between CNN and RNN is proposed specifically for air-writing recognition. By modeling the air-writing using both spatial and temporal features, the proposed network can learn more information than existing techniques. Performance of the proposed network is evaluated using the alphabet and numeric datasets in the public 6DMG database. The average accuracy of the proposed fusion network outperforms other techniques: 99.25% and 99.83% are observed for the alphabet gestures and the numeric gestures, respectively. A simplified RNN structure is also proposed, which attains about a two-fold speed-up over an ordinary BLSTM network. It is also confirmed that the distance between consecutive sampling points alone is enough to attain high recognition performance.
Standard-Compliant Multiple Description Image Coding Based on Convolutional Neural Networks
Ting ZHANG Huihui BAI Mengmeng ZHANG Yao ZHAO

LETTER-Image Processing and Video Processing

Publicized:
2018/07/19
Vol:
E101-D No:10
Page(s):
2543-2546
Multiple description (MD) coding is an attractive framework for robust information transmission over non-prioritized and unpredictable networks. In this paper, a novel MD image coding scheme is proposed based on convolutional neural networks (CNNs), which aims to improve the reconstructed quality of side and central decoders. For this purpose, a given image is first encoded into two independent descriptions by sub-sampling. Such a design can make the proposed method compatible with the existing image coding standards. At the decoder, in order to achieve high-quality side and central image reconstruction, three CNNs, including two side decoder sub-networks and one central decoder sub-network, are adopted into an end-to-end reconstruction framework. Experimental results show the improvement achieved by the proposed scheme in terms of both peak signal-to-noise ratio values and subjective quality. The proposed method demonstrates better rate-distortion performance for both the central and side decoders.
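The abstract above describes splitting an image into two descriptions by sub-sampling. The sketch below shows one simple way to do that and is not the paper's method: it assumes an alternate-column split, and it swaps the CNN decoders for plain interleaving (central decoder) and column repetition (side decoder). All function names are hypothetical.

```python
def split_descriptions(image):
    """Split an image (list of rows) into even-column and odd-column halves."""
    d0 = [row[0::2] for row in image]
    d1 = [row[1::2] for row in image]
    return d0, d1

def central_decode(d0, d1):
    """Both descriptions arrived: interleave the columns back together."""
    out = []
    for r0, r1 in zip(d0, d1):
        row = []
        for i, v in enumerate(r0):
            row.append(v)
            if i < len(r1):
                row.append(r1[i])
        out.append(row)
    return out

def side_decode(d):
    """Only one description arrived: repeat each column (nearest neighbour)."""
    return [[v for v in row for _ in range(2)] for row in d]

image = [[10, 20, 30, 40],
         [50, 60, 70, 80]]
d0, d1 = split_descriptions(image)
print(central_decode(d0, d1))   # lossless when both descriptions arrive
print(side_decode(d0))          # coarse estimate from one description
```

Because each description is itself an ordinary image, each can be passed to a standard image codec, which is what makes a sub-sampling design compatible with existing coding standards.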
Finding Important People in a Video Using Deep Neural Networks with Conditional Random Fields
Mayu OTANI Atsushi NISHIDA Yuta NAKASHIMA Tomokazu SATO Naokazu YOKOYA

PAPER-Image Recognition, Computer Vision

Publicized:
2018/07/20
Vol:
E101-D No:10
Page(s):
2509-2517
Finding important regions is essential for applications such as content-aware video compression and video retargeting, which automatically crops a region in a video for small screens. Since people are one of the main subjects when taking a video, some methods for finding important regions use a visual attention model based on face/pedestrian detection to incorporate the knowledge that people are important. However, such methods usually do not distinguish important people from passers-by and bystanders, which results in false positives. In this paper, we propose a deep neural network (DNN)-based method, which classifies a person as important or unimportant, given a video containing multiple people in a single frame and captured with a hand-held camera. Intuitively, important/unimportant labels are highly correlated given that corresponding people's spatial motions are similar. Based on this assumption, we propose to boost the performance of our important/unimportant classification by using conditional random fields (CRFs) built upon the DNN, which can be trained in an end-to-end manner. Our experimental results show that our method successfully classifies important people and the use of a DNN with CRFs improves the accuracy.
Advanced Ensemble Adversarial Example on Unknown Deep Neural Network Classifiers
Hyun KWON Yongchul KIM Ki-Woong PARK Hyunsoo YOON Daeseon CHOI

PAPER-Artificial Intelligence, Data Mining

Publicized:
2018/07/06
Vol:
E101-D No:10
Page(s):
2485-2500
Deep neural networks (DNNs) are widely used in many applications such as image, voice, and pattern recognition. However, it has recently been shown that a DNN can be vulnerable to a small distortion in images that humans cannot distinguish. This type of attack is known as an adversarial example and is a significant threat to deep learning systems. The unknown-target-oriented generalized adversarial example that can deceive most DNN classifiers is even more threatening. We propose a generalized adversarial example attack method that can effectively attack unknown classifiers by using a hierarchical ensemble method. Our proposed scheme creates advanced ensemble adversarial examples to achieve reasonable attack success rates for unknown classifiers. Our experiment results show that the proposed method can achieve attack success rates for an unknown classifier of up to 9.25% and 18.94% higher on MNIST data and 4.1% and 13% higher on CIFAR10 data compared with the previous ensemble method and the conventional baseline method, respectively.
TS-ICNN: Time Sequence-Based Interval Convolutional Neural Networks for Human Action Detection and Recognition
Zhendong ZHUANG Yang XUE

LETTER-Human-computer Interaction

Publicized:
2018/07/20
Vol:
E101-D No:10
Page(s):
2534-2538
Research on inertial-sensor-based human action detection and recognition (HADR) is a new area in machine learning. We propose a novel time-sequence-based interval convolutional neural network framework for HADR that combines an interesting-interval proposal generator and an interval-based classifier. Experiments demonstrate the good performance of our method.
Dynamic Fixed-Point Design of Neuromorphic Computing Systems
Yongshin KANG Jaeyong CHUNG

BRIEF PAPER-Microwaves, Millimeter-Waves

Vol:
E101-C No:10
Page(s):
840-844
Practical deep neural networks have a number of weight parameters, and the dynamic fixed-point formats have been used to represent them efficiently. The dynamic fixed-point representations share a scaling factor among a group of numbers, and the weights in a layer have been formed into such a group. In this paper, we first explore a design space for dynamic fixed-point neuromorphic computing systems and show that it is indispensable to have a small group size in neuromorphic architectures, because it is appropriate to group the weights associated with a neuron into a group. We then present a dynamic fixed-point representation designed for neuromorphic computing systems. Our experimental results show that the proposed representation reduces the required weight bitwidth by about 4 bits compared to the conventional fixed-point format.
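The abstract above describes sharing one scaling factor among a group of weights. The following is a minimal sketch of that idea, assuming a power-of-two scaling factor and 8-bit signed integers. Both choices, the function names, and the toy weights are illustrative assumptions and are not taken from the paper.

```python
import math

def quantize_group(weights, bits=8):
    """Quantize a group of weights that share one power-of-two scale.

    Returns (integers, exponent) with each weight approximated by
    integer * 2**exponent.
    """
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(w) for w in weights)
    if peak == 0:
        return [0] * len(weights), 0
    # Smallest exponent for which the largest magnitude still fits in qmax.
    exponent = math.ceil(math.log2(peak / qmax))
    scale = 2.0 ** exponent
    ints = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return ints, exponent

def dequantize_group(ints, exponent):
    return [v * 2.0 ** exponent for v in ints]

# Two hypothetical neurons whose weights differ greatly in magnitude.
neuron_a = [0.5, -0.25]
neuron_b = [0.001, 0.002]

# One group per layer: the small weights of neuron_b collapse to zero.
layer_ints, layer_exp = quantize_group(neuron_a + neuron_b)

# One group per neuron: neuron_b gets its own, finer scale.
b_ints, b_exp = quantize_group(neuron_b)

print(layer_ints, layer_exp)
print(b_ints, b_exp)
```

In this toy case a layer-wide group wastes the available bits on the large weights, while a per-neuron group keeps the small weights representable at the same bitwidth.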
A Unified Neural Network for Quality Estimation of Machine Translation
Maoxi LI Qingyu XIANG Zhiming CHEN Mingwen WANG

LETTER-Natural Language Processing

Publicized:
2018/06/18
Vol:
E101-D No:9
Page(s):
2417-2421
The state-of-the-art neural model for quality estimation (QE) of machine translation consists of two sub-networks that are tuned separately: a bidirectional recurrent neural network (RNN) encoder-decoder trained for neural machine translation, called the predictor, and an RNN trained for sentence-level QE tasks, called the estimator. We propose to combine the two sub-networks into a whole neural network, called the unified neural network. When training, the bidirectional RNN encoder-decoder is initialized and pre-trained with the bilingual parallel corpus, and then the networks are trained jointly to minimize the mean absolute error over the QE training samples. Compared with the predictor and estimator approach, the use of a unified neural network helps to train the parameters of the neural networks that are more suitable for the QE task. Experimental results on the benchmark data set of the WMT17 sentence-level QE shared task show that the proposed unified neural network approach consistently outperforms the predictor and estimator approach and significantly outperforms the other baseline QE approaches.
A Machine Learning-Based Approach for Selecting SpMV Kernels and Matrix Storage Formats
Hang CUI Shoichi HIRASAWA Hiroaki KOBAYASHI Hiroyuki TAKIZAWA

PAPER-Artificial Intelligence, Data Mining

Publicized:
2018/06/13
Vol:
E101-D No:9
Page(s):
2307-2314
Sparse Matrix-Vector multiplication (SpMV) is a computational kernel widely used in many applications. Because of its importance, many different implementations have been proposed to accelerate this computational kernel. The performance characteristics of those SpMV implementations are quite different, and it is generally difficult to select the implementation that has the best performance for a given sparse matrix without performance profiling. One existing approach to the SpMV best-code selection problem is to use manually predefined features and a machine learning model for the selection. However, it is generally hard to manually define features that can perfectly express the characteristics of the original sparse matrix necessary for the code selection. In addition, this approach can lose some information. This paper hence presents an effective deep learning mechanism for SpMV code selection best suited for a given sparse matrix. Instead of using manually predefined features of a sparse matrix, a feature image and a deep learning network are used to map each sparse matrix to the implementation that is expected to have the best performance, in advance of the execution. The benefits of using the proposed mechanism are discussed by calculating the prediction accuracy and the performance. According to the evaluation, the proposed mechanism can select an optimal or suboptimal implementation for an unseen sparse matrix in the test data set in most cases. These results demonstrate that, by using deep learning, a whole sparse matrix can be used to predict the best implementation, and the prediction accuracy achieved by the proposed mechanism is higher than that of using predefined features.
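For readers unfamiliar with the SpMV kernel and matrix storage formats mentioned in the abstract above, here is a minimal sketch using the widely used compressed sparse row (CSR) format. CSR is chosen here only as a familiar example; the abstract does not say which formats the paper compares.

```python
def to_csr(dense):
    """Convert a dense matrix (list of rows) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))   # end of this row in values/col_idx
    return values, col_idx, row_ptr

def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

dense = [[1, 0, 2],
         [0, 0, 3],
         [4, 5, 0]]
values, col_idx, row_ptr = to_csr(dense)
y = spmv_csr(values, col_idx, row_ptr, [1, 2, 3])
print(values, col_idx, row_ptr)
print(y)
```

Different storage formats trade memory layout against access pattern, so which one runs fastest depends on the sparsity structure of the matrix; that dependence is the selection problem the abstract addresses.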
Computational Power of Threshold Circuits of Energy at most Two
Hiroki MANIWA Takayuki OKI Akira SUZUKI Kei UCHIZAWA Xiao ZHOU

PAPER

Vol:
E101-A No:9
Page(s):
1431-1439
The energy of a threshold circuit C is defined to be the maximum number of gates outputting ones for an input assignment, where the maximum is taken over all the input assignments. In this paper, we study computational power of threshold circuits of energy at most two. We present several results showing that the computational power of threshold circuits of energy one and the counterpart of energy two are remarkably different. In particular, we give an explicit function which requires an exponential size for threshold circuits of energy one, but is computable by a threshold circuit of size just two and energy two. We also consider MOD functions and Generalized Inner Product functions, and show that these functions also require exponential size for threshold circuits of energy one, but are computable by threshold circuits of substantially less size and energy two.
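The energy definition in the abstract above can be checked by brute force on small circuits. The sketch below evaluates a feedforward threshold circuit on every input assignment and returns the maximum number of gates that output one. The circuit encoding and the two example XOR circuits are assumptions made for illustration and do not come from the paper.

```python
from itertools import product

def evaluate(gates, x):
    """Evaluate gates in order; return the list of gate outputs.

    Each gate is (weights, threshold). Its weights apply to the circuit
    inputs followed by the outputs of all earlier gates, and it outputs 1
    iff the weighted sum is at least the threshold.
    """
    values = list(x)
    outputs = []
    for weights, threshold in gates:
        total = sum(w * v for w, v in zip(weights, values))
        out = 1 if total >= threshold else 0
        values.append(out)
        outputs.append(out)
    return outputs

def energy(gates, n_inputs):
    """Maximum number of gates outputting 1 over all input assignments."""
    return max(sum(evaluate(gates, x))
               for x in product((0, 1), repeat=n_inputs))

# XOR of two inputs as AND followed by an output gate: x1 + x2 - 2*AND >= 1.
xor_low = [((1, 1), 2), ((1, 1, -2), 1)]
# XOR of two inputs as OR, NAND, then AND of those two gates.
xor_high = [((1, 1), 1), ((-1, -1), -1), ((0, 0, 1, 1), 2)]

print(energy(xor_low, 2), energy(xor_high, 2))
```

Both circuits compute the same function, yet they differ in energy, which illustrates that energy is a property of the circuit rather than of the function it computes.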
